Overview

Dataset statistics

Number of variables: 15
Number of observations: 163994
Missing cells: 6186
Missing cells (%): 0.3%
Duplicate rows: 6
Duplicate rows (%): < 0.1%
Total size in memory: 18.8 MiB
Average record size in memory: 120.0 B
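
The dataset-level figures can be cross-checked against the per-variable counts reported in the Variables section. The following is a minimal sketch of that arithmetic, not the profiler's own code. The dictionary and variable names are illustrative, and the 8-bytes-per-column reading of the 120.0 B record size is an assumption that happens to fit the reported totals.

```python
# Per-variable missing counts, copied from the variable sections of this report.
missing_per_variable = {
    "Loan_Amount": 7, "Term": 7, "Interest_Rate": 7, "Employment_Years": 5811,
    "Home_Ownership": 7, "Annual_Income": 11, "Verification_Status": 7,
    "Loan_Purpose": 7, "State": 7, "Debt_to_Income": 7, "Delinquent_2yr": 36,
    "Revolving_Cr_Util": 200, "Total_Accounts": 36, "Bad_Loan": 0,
    "Longest_Credit_Length": 36,
}
n_variables = len(missing_per_variable)
n_observations = 163994

# Missing cells are the sum of the per-variable missing counts.
missing_cells = sum(missing_per_variable.values())

# The percentage is taken over all cells (rows times columns).
missing_pct = 100 * missing_cells / (n_variables * n_observations)

# Assumption: 120.0 B per record corresponds to 15 columns at 8 bytes each.
record_bytes = n_variables * 8
total_mib = n_observations * record_bytes / 2**20

print(missing_cells, round(missing_pct, 1), round(total_mib, 1))
```

The three printed values should line up with the "Missing cells", "Missing cells (%)" and "Total size in memory" rows above.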

Variable types

NUM: 9
CAT: 6

Reproduction

Analysis started: 2021-03-22 13:47:55.167736
Analysis finished: 2021-03-22 13:48:59.367501
Duration: 1 minute and 4.2 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

[Duplicates] Dataset has 6 (< 0.1%) duplicate rows
[Missing] Employment_Years has 5811 (3.5%) missing values
[Skewed] Annual_Income is highly skewed (γ1 = 35.49006707)
[Zeros] Delinquent_2yr has 139459 (85.0%) zeros
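
The skewness warning is based on the coefficient γ1, the third standardized moment. As a rough illustration, the sketch below computes the plain population form, m3 / m2^1.5, on two small made-up samples. The function name and sample data are hypothetical, and the value reported for Annual_Income comes from pandas, which may apply a bias correction, so this sketch is not expected to reproduce 35.49 digit for digit.

```python
def skewness(values):
    """Population skewness: third central moment over variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Made-up samples: one symmetric, one with a single large value on the right.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 1, 1, 100]

print(skewness(symmetric))     # close to zero for a symmetric sample
print(skewness(right_tailed))  # clearly positive for a long right tail
```

A few very large incomes (the maximum is 7141778 against a median of 61000) pull γ1 upward in the same way the single value 100 does here.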

Variables

Loan_Amount
Real number (ℝ≥0)

Distinct count: 1274
Unique (%): 0.8%
Missing: 7
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 13074.169141456336
Minimum: 500.0
Maximum: 35000.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.3 MiB

Quantile statistics

Minimum: 500
5-th percentile: 3000
Q1: 7000
Median: 11325
Q3: 18000
95-th percentile: 30000
Maximum: 35000
Range: 34500
Interquartile range (IQR): 11000
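
The last two rows of each quantile block follow directly from the others: the range is maximum minus minimum, and the IQR is Q3 minus Q1. A small sketch using the quantiles reported for Loan_Amount and Annual_Income; the helper name `spread` and the dictionary layout are illustrative only.

```python
def spread(summary):
    """Return (range, IQR) from a dict with min, q1, q3 and max entries."""
    value_range = summary["max"] - summary["min"]
    iqr = summary["q3"] - summary["q1"]
    return value_range, iqr

# Quantiles copied from this report.
loan_amount = {"min": 500, "q1": 7000, "q3": 18000, "max": 35000}
annual_income = {"min": 1896, "q1": 45000, "q3": 85000, "max": 7141778}

print(spread(loan_amount))    # (34500, 11000)
print(spread(annual_income))  # (7139882, 40000)
```

For Annual_Income the range is roughly 180 times the IQR, which is another view of the heavy right tail flagged in the warnings.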

Descriptive statistics

Standard deviation: 7993.556189
Coefficient of variation (CV): 0.6114007018
Kurtosis: 0.2289414268
Mean: 13074.16914
Median Absolute Deviation (MAD): 5125
Skewness: 0.87534248
Sum: 2143993775
Variance: 63896940.54
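
Several of the descriptive statistics are tied together by simple identities: the mean is the sum over the non-missing count, the variance is the squared standard deviation, and the CV is the standard deviation over the mean. The sketch below checks these against the Loan_Amount figures, which are rounded in the report, so agreement is only expected to a few decimals. Variable names are illustrative.

```python
# Figures copied from the Loan_Amount section of this report.
n_observations = 163994
n_missing = 7
total = 2143993775
std_dev = 7993.556189

n_valid = n_observations - n_missing  # rows that contribute to the statistics
mean = total / n_valid                # mean = sum / non-missing count
variance = std_dev ** 2               # variance = standard deviation squared
cv = std_dev / mean                   # coefficient of variation

print(round(mean, 5), round(variance, 2), round(cv, 7))
```

The same identities can be used as a quick sanity check on any of the other numeric variables.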
Histogram with fixed size bins (bins=10)
Common values

Value                  Count     Frequency (%)
10000                  11795     7.2%
12000                  9164      5.6%
15000                  7598      4.6%
20000                  6864      4.2%
8000                   5860      3.6%
6000                   5830      3.6%
5000                   5582      3.4%
35000                  4478      2.7%
16000                  4024      2.5%
18000                  3668      2.2%
Other values (1264)    99124     60.4%

Minimum 5 values

Value                  Count     Frequency (%)
500                    11        < 0.1%
550                    1         < 0.1%
600                    6         < 0.1%
700                    3         < 0.1%
725                    1         < 0.1%

Maximum 5 values

Value                  Count     Frequency (%)
35000                  4478      2.7%
34975                  6         < 0.1%
34900                  2         < 0.1%
34875                  2         < 0.1%
34850                  2         < 0.1%

Term
Categorical

Distinct count: 2
Unique (%): < 0.1%
Missing: 7
Missing (%): < 0.1%
Memory size: 1.3 MiB

Value                  Count     Frequency (%)
36                     129950    79.2%
60                     34037     20.8%
(Missing)              7         < 0.1%

Length

Max length: 4
Median length: 4
Mean length: 3.999957316
Min length: 3

Interest_Rate
Real number (ℝ≥0)

Distinct count: 512
Unique (%): 0.3%
Missing: 7
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 13.715904065566173
Minimum: 5.42
Maximum: 26.06
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.3 MiB

Quantile statistics

Minimum: 5.42
5-th percentile: 6.62
Q1: 10.65
Median: 13.49
Q3: 16.32
95-th percentile: 21.67
Maximum: 26.06
Range: 20.64
Interquartile range (IQR): 5.67

Descriptive statistics

Standard deviation: 4.391939871
Coefficient of variation (CV): 0.3202078295
Kurtosis: -0.3207745705
Mean: 13.71590407
Median Absolute Deviation (MAD): 2.84
Skewness: 0.3278648857
Sum: 2249229.96
Variance: 19.28913583
Histogram with fixed size bins (bins=10)
Common values

Value                  Count     Frequency (%)
12.12                  5182      3.2%
13.11                  4627      2.8%
7.9                    4589      2.8%
8.9                    4516      2.8%
15.31                  3494      2.1%
16.29                  3458      2.1%
10.99                  3454      2.1%
14.33                  3321      2.0%
6.03                   3267      2.0%
11.14                  3252      2.0%
Other values (502)     124827    76.1%

Minimum 5 values

Value                  Count     Frequency (%)
5.42                   573       0.3%
5.79                   405       0.2%
5.93                   11        < 0.1%
5.99                   347       0.2%
6                      34        < 0.1%

Maximum 5 values

Value                  Count     Frequency (%)
26.06                  70        < 0.1%
25.99                  79        < 0.1%
25.89                  117       0.1%
25.83                  151       0.1%
25.8                   206       0.1%

Employment_Years
Real number (ℝ≥0)

MISSING

Distinct count: 11
Unique (%): < 0.1%
Missing: 5811
Missing (%): 3.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.729389378125336
Minimum: 0.5
Maximum: 10.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.3 MiB

Quantile statistics

Minimum: 0.5
5-th percentile: 0.5
Q1: 2
Median: 6
Q3: 10
95-th percentile: 10
Maximum: 10
Range: 9.5
Interquartile range (IQR): 8

Descriptive statistics

Standard deviation: 3.541944849
Coefficient of variation (CV): 0.618206342
Kurtosis: -1.518433619
Mean: 5.729389378
Median Absolute Deviation (MAD): 4
Skewness: -0.05799236246
Sum: 906292
Variance: 12.54537332
Histogram with fixed size bins (bins=10)
Common values

Value                  Count     Frequency (%)
10                     47183     28.8%
2                      15766     9.6%
0.5                    14248     8.7%
3                      13611     8.3%
5                      12347     7.5%
1                      11414     7.0%
4                      11024     6.7%
6                      10000     6.1%
7                      9079      5.5%
8                      7424      4.5%

Minimum 5 values

Value                  Count     Frequency (%)
0.5                    14248     8.7%
1                      11414     7.0%
2                      15766     9.6%
3                      13611     8.3%
4                      11024     6.7%

Maximum 5 values

Value                  Count     Frequency (%)
10                     47183     28.8%
9                      6087      3.7%
8                      7424      4.5%
7                      9079      5.5%
6                      10000     6.1%

Home_Ownership
Categorical

Distinct count: 6
Unique (%): < 0.1%
Missing: 7
Missing (%): < 0.1%
Memory size: 1.3 MiB

Value                  Count     Frequency (%)
MORTGAGE               79714     48.6%
RENT                   70526     43.0%
OWN                    13560     8.3%
OTHER                  156       0.1%
NONE                   30        < 0.1%
ANY                    1         < 0.1%
(Missing)              7         < 0.1%

Length

Max length: 8
Median length: 4
Mean length: 5.862531556
Min length: 3

Annual_Income
Real number (ℝ≥0)

SKEWED

Distinct count: 14112
Unique (%): 8.6%
Missing: 11
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 71915.670519749
Minimum: 1896.0
Maximum: 7141778.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.3 MiB

Quantile statistics

Minimum: 1896
5-th percentile: 27000
Q1: 45000
Median: 61000
Q3: 85000
95-th percentile: 145000
Maximum: 7141778
Range: 7139882
Interquartile range (IQR): 40000

Descriptive statistics

Standard deviation: 59070.91565
Coefficient of variation (CV): 0.8213914329
Kurtosis: 3267.743467
Mean: 71915.67052
Median Absolute Deviation (MAD): 19217
Skewness: 35.49006707
Sum: 1.17929474e+10
Variance: 3489373076
Histogram with fixed size bins (bins=10)
Common values

Value                  Count     Frequency (%)
60000                  6325      3.9%
50000                  5379      3.3%
65000                  4415      2.7%
40000                  4393      2.7%
45000                  4208      2.6%
70000                  4172      2.5%
75000                  3940      2.4%
80000                  3832      2.3%
55000                  3729      2.3%
90000                  2916      1.8%
Other values (14102)   120674    73.6%

Minimum 5 values

Value                  Count     Frequency (%)
1896                   1         < 0.1%
2000                   1         < 0.1%
3000                   1         < 0.1%
3300                   1         < 0.1%
3500                   1         < 0.1%

Maximum 5 values

Value                  Count     Frequency (%)
7141778                1         < 0.1%
6100000                1         < 0.1%
6000000                1         < 0.1%
5000000                1         < 0.1%
4900000                1         < 0.1%

Verification_Status
Categorical

Distinct count: 3
Unique (%): < 0.1%
Missing: 7
Missing (%): < 0.1%
Memory size: 1.3 MiB

Value                      Count     Frequency (%)
VERIFIED - income          60875     37.1%
not verified               59155     36.1%
VERIFIED - income source   43957     26.8%
(Missing)                  7         < 0.1%

Length

Max length: 24
Median length: 17
Mean length: 17.07211239
Min length: 3

Loan_Purpose
Categorical

Distinct count: 14
Unique (%): < 0.1%
Missing: 7
Missing (%): < 0.1%
Memory size: 1.3 MiB

Value                  Count     Frequency (%)
debt_consolidation     93261     56.9%
credit_card            30792     18.8%
other                  10492     6.4%
home_improvement       9872      6.0%
major_purchase         4686      2.9%
small_business         3841      2.3%
car                    2842      1.7%
medical                2029      1.2%
wedding                1751      1.1%
moving                 1464      0.9%
Other values (4)       2957      1.8%

Length

Max length: 18
Median length: 18
Mean length: 14.71852629
Min length: 3

State
Categorical

Distinct count: 50
Unique (%): < 0.1%
Missing: 7
Missing (%): < 0.1%
Memory size: 1.3 MiB

Value                  Count     Frequency (%)
CA                     28702     17.5%
NY                     14285     8.7%
TX                     12128     7.4%
FL                     11396     6.9%
NJ                     6457      3.9%
IL                     6099      3.7%
PA                     5427      3.3%
VA                     5282      3.2%
GA                     5189      3.2%
OH                     4896      3.0%
Other values (40)      64126     39.1%

Length

Max length: 3
Median length: 2
Mean length: 2.000042684
Min length: 2

Debt_to_Income
Real number (ℝ≥0)

Distinct count: 3735
Unique (%): 2.3%
Missing: 7
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 15.881530121290105
Minimum: 0.0
Maximum: 39.99
Zeros: 270
Zeros (%): 0.2%
Memory size: 1.3 MiB

Quantile statistics

Minimum: 0
5-th percentile: 3.79
Q1: 10.23
Median: 15.62
Q3: 21.26
95-th percentile: 29.02
Maximum: 39.99
Range: 39.99
Interquartile range (IQR): 11.03

Descriptive statistics

Standard deviation: 7.587668224
Coefficient of variation (CV): 0.4777668251
Kurtosis: -0.523370475
Mean: 15.88153012
Median Absolute Deviation (MAD): 5.51
Skewness: 0.1821600668
Sum: 2604364.48
Variance: 57.57270908
Histogram with fixed size bins (bins=10)
Common values

Value                  Count     Frequency (%)
0                      270       0.2%
16.8                   152       0.1%
14.4                   141       0.1%
19.2                   140       0.1%
18                     135       0.1%
12                     134       0.1%
15.6                   129       0.1%
13.2                   127       0.1%
20.4                   126       0.1%
21.6                   123       0.1%
Other values (3725)    162510    99.1%

Minimum 5 values

Value                  Count     Frequency (%)
0                      270       0.2%
0.01                   6         < 0.1%
0.02                   8         < 0.1%
0.03                   3         < 0.1%
0.04                   5         < 0.1%

Maximum 5 values

Value                  Count     Frequency (%)
39.99                  1         < 0.1%
39.93                  1         < 0.1%
39.88                  2         < 0.1%
39.85                  1         < 0.1%
39.84                  2         < 0.1%

Delinquent_2yr
Real number (ℝ≥0)

ZEROS

Distinct count: 19
Unique (%): < 0.1%
Missing: 36
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.2273570060625282
Minimum: 0.0
Maximum: 29.0
Zeros: 139459
Zeros (%): 85.0%
Memory size: 1.3 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 1
Maximum: 29
Range: 29
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.6941679229
Coefficient of variation (CV): 3.05320665
Kurtosis: 72.66758384
Mean: 0.2273570061
Median Absolute Deviation (MAD): 0
Skewness: 5.960591595
Sum: 37277
Variance: 0.4818691052
Histogram with fixed size bins (bins=10)
Common values

Value                  Count     Frequency (%)
0                      139459    85.0%
1                      17158     10.5%
2                      4635      2.8%
3                      1488      0.9%
4                      579       0.4%
5                      310       0.2%
6                      144       0.1%
7                      68        < 0.1%
8                      42        < 0.1%
9                      26        < 0.1%
Other values (9)       49        < 0.1%
(Missing)              36        < 0.1%

Minimum 5 values

Value                  Count     Frequency (%)
0                      139459    85.0%
1                      17158     10.5%
2                      4635      2.8%
3                      1488      0.9%
4                      579       0.4%

Maximum 5 values

Value                  Count     Frequency (%)
29                     1         < 0.1%
18                     3         < 0.1%
16                     1         < 0.1%
15                     2         < 0.1%
14                     4         < 0.1%

Revolving_Cr_Util
Real number (ℝ≥0)

Distinct count: 1170
Unique (%): 0.7%
Missing: 200
Missing (%): 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 54.07917280242256
Minimum: 0.0
Maximum: 150.7
Zeros: 1562
Zeros (%): 1.0%
Memory size: 1.3 MiB

Quantile statistics

Minimum: 0
5-th percentile: 8.7
Q1: 35.6
Median: 55.8
Q3: 74.2
95-th percentile: 92.5
Maximum: 150.7
Range: 150.7
Interquartile range (IQR): 38.6

Descriptive statistics

Standard deviation: 25.28536677
Coefficient of variation (CV): 0.4675620106
Kurtosis: -0.8046268733
Mean: 54.0791728
Median Absolute Deviation (MAD): 19.2
Skewness: -0.2489493863
Sum: 8857844.03
Variance: 639.3497725
Histogram with fixed size bins (bins=10)
Common values

Value                  Count     Frequency (%)
0                      1562      1.0%
63                     271       0.2%
62                     271       0.2%
53                     270       0.2%
58                     266       0.2%
70.1                   263       0.2%
61.3                   261       0.2%
70.8                   259       0.2%
65                     257       0.2%
57                     257       0.2%
Other values (1160)    159857    97.5%

Minimum 5 values

Value                  Count     Frequency (%)
0                      1562      1.0%
0.01                   1         < 0.1%
0.03                   1         < 0.1%
0.04                   1         < 0.1%
0.05                   1         < 0.1%

Maximum 5 values

Value                  Count     Frequency (%)
150.7                  1         < 0.1%
129.4                  1         < 0.1%
128.1                  1         < 0.1%
120.2                  1         < 0.1%
119                    1         < 0.1%

Total_Accounts
Real number (ℝ≥0)

Distinct count: 96
Unique (%): 0.1%
Missing: 36
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 24.57973383427463
Minimum: 1.0
Maximum: 118.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.3 MiB

Quantile statistics

Minimum: 1
5-th percentile: 8
Q1: 16
Median: 23
Q3: 31
95-th percentile: 46
Maximum: 118
Range: 117
Interquartile range (IQR): 15

Descriptive statistics

Standard deviation: 11.68519037
Coefficient of variation (CV): 0.4753993857
Kurtosis: 0.6264591488
Mean: 24.57973383
Median Absolute Deviation (MAD): 8
Skewness: 0.7671960425
Sum: 4030044
Variance: 136.5436739
Histogram with fixed size bins (bins=10)
Common values

Value                  Count     Frequency (%)
20                     5970      3.6%
21                     5947      3.6%
22                     5824      3.6%
23                     5816      3.5%
17                     5800      3.5%
19                     5745      3.5%
18                     5724      3.5%
24                     5569      3.4%
16                     5487      3.3%
25                     5410      3.3%
Other values (86)      106666    65.0%

Minimum 5 values

Value                  Count     Frequency (%)
1                      21        < 0.1%
2                      49        < 0.1%
3                      323       0.2%
4                      782       0.5%
5                      1134      0.7%

Maximum 5 values

Value                  Count     Frequency (%)
118                    1         < 0.1%
102                    1         < 0.1%
99                     2         < 0.1%
95                     1         < 0.1%
94                     1         < 0.1%

Bad_Loan
Categorical

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.3 MiB

Value                  Count     Frequency (%)
GOOD                   133978    81.7%
BAD                    30016     18.3%

Length

Max length: 4
Median length: 4
Mean length: 3.816968913
Min length: 3

Longest_Credit_Length
Real number (ℝ≥0)

Distinct count: 63
Unique (%): < 0.1%
Missing: 36
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 14.854273655448347
Minimum: 0.0
Maximum: 65.0
Zeros: 11
Zeros (%): < 0.1%
Memory size: 1.3 MiB

Quantile statistics

Minimum: 0
5-th percentile: 6
Q1: 10
Median: 14
Q3: 18
95-th percentile: 28
Maximum: 65
Range: 65
Interquartile range (IQR): 8

Descriptive statistics

Standard deviation: 6.947732923
Coefficient of variation (CV): 0.4677261968
Kurtosis: 1.961383604
Mean: 14.85427366
Median Absolute Deviation (MAD): 4
Skewness: 1.131950456
Sum: 2435477
Variance: 48.27099276
Histogram with fixed size bins (bins=10)
Common values

Value                  Count     Frequency (%)
12                     12756     7.8%
13                     12540     7.6%
11                     11889     7.2%
14                     11287     6.9%
15                     9778      6.0%
10                     9649      5.9%
16                     8417      5.1%
17                     7719      4.7%
9                      7649      4.7%
8                      7032      4.3%
Other values (53)      65242     39.8%

Minimum 5 values

Value                  Count     Frequency (%)
0                      11        < 0.1%
1                      67        < 0.1%
2                      100       0.1%
3                      914       0.6%
4                      2477      1.5%

Maximum 5 values

Value                  Count     Frequency (%)
65                     1         < 0.1%
61                     2         < 0.1%
60                     2         < 0.1%
59                     1         < 0.1%
58                     3         < 0.1%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for an exactly linear relationship the steepness of the line does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
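
The definition above translates almost directly into code. This is a minimal pure-Python sketch with made-up data, not the routine the report itself uses; the function name `pearson_r` is illustrative.

```python
import math

def pearson_r(xs, ys):
    """Covariance of X and Y divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (std_x * std_y)

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]

r = pearson_r(xs, ys)
# Shifting and rescaling one variable leaves r unchanged.
r_rescaled = pearson_r([3 * x + 5 for x in xs], ys)
print(r, r_rescaled)
```

Because the 1/n factors cancel, using n - 1 in both the covariance and the standard deviations gives the same r.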

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic correlations. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
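
In code, this amounts to replacing each value by its rank and then applying the Pearson formula to the ranks. The sketch below assumes tied values share the average of their rank positions, which is a common convention but not something this report states. Function names and data are illustrative.

```python
import math

def average_ranks(values):
    """Rank values from 1; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def spearman_rho(xs, ys):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]  # monotonic but not linear

print(spearman_rho(xs, ys), pearson_r(xs, ys))
```

For the cubic relationship the ranks agree perfectly, so ρ reaches 1 while Pearson's r stays below 1.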

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the discordant pairs divided by the total number of pairs.
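
The pair-counting description can be sketched as follows. This implements the simple form stated above (often called tau-a), where tied pairs count as neither concordant nor discordant. Statistical libraries commonly use a tie-adjusted variant, so values may differ on data with many ties. The function name and data are illustrative.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1  # both variables move in the same direction
        elif direction < 0:
            discordant += 1  # the variables move in opposite directions
    n = len(xs)
    total_pairs = n * (n - 1) / 2
    return (concordant - discordant) / total_pairs

xs = [1, 2, 3, 4]
ys = [1, 3, 2, 4]  # one swapped pair out of six

print(kendall_tau_a(xs, ys))
```

With four observations there are six pairs; five are concordant and one is discordant, giving τ = 4/6.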

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available in the phik project documentation.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
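
As an illustration of the bias correction, the sketch below computes Cramér's V from a contingency table of counts using the correction as it is usually stated: the chi-squared-based φ² is reduced by (k-1)(r-1)/(n-1), and the row and column counts are shrunk accordingly. The function name and the example tables are made up, and this is not claimed to be the exact routine used to produce this report.

```python
import math

def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a table of counts (list of rows).

    Assumes every row and column total is non-zero and the table is at least 2x2.
    """
    n = sum(sum(row) for row in table)
    r = len(table)
    k = len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corrected = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corrected = r - (r - 1) ** 2 / (n - 1)
    k_corrected = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corrected / min(k_corrected - 1, r_corrected - 1))

independent = [[10, 10], [10, 10]]
perfect = [[50, 0], [0, 50]]

print(cramers_v_corrected(independent), cramers_v_corrected(perfect))
```

The independent table yields 0 and the perfectly associated table yields 1, matching the endpoints described above.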

Missing values

Sample

First rows

Row | Loan_Amount | Term | Interest_Rate | Employment_Years | Home_Ownership | Annual_Income | Verification_Status | Loan_Purpose | State | Debt_to_Income | Delinquent_2yr | Revolving_Cr_Util | Total_Accounts | Bad_Loan | Longest_Credit_Length
0 | 5000.0 | 36.0 | 10.65 | 10.0 | RENT | 24000.0 | VERIFIED - income | credit_card | AZ | 27.65 | 0.0 | 83.7 | 9.0 | GOOD | 26.0
1 | 2500.0 | 60.0 | 15.27 | 0.5 | RENT | 30000.0 | VERIFIED - income source | car | GA | 1.00 | 0.0 | 9.4 | 4.0 | BAD | 12.0
2 | 2400.0 | 36.0 | 15.96 | 10.0 | RENT | 12252.0 | not verified | small_business | IL | 8.72 | 0.0 | 98.5 | 10.0 | GOOD | 10.0
3 | 10000.0 | 36.0 | 13.49 | 10.0 | RENT | 49200.0 | VERIFIED - income source | other | CA | 20.00 | 0.0 | 21.0 | 37.0 | GOOD | 15.0
4 | 5000.0 | 36.0 | 7.90 | 3.0 | RENT | 36000.0 | VERIFIED - income source | wedding | AZ | 11.20 | 0.0 | 28.3 | 12.0 | GOOD | 7.0
5 | 3000.0 | 36.0 | 18.64 | 9.0 | RENT | 48000.0 | VERIFIED - income source | car | CA | 5.35 | 0.0 | 87.5 | 4.0 | GOOD | 4.0
6 | 5600.0 | 60.0 | 21.28 | 4.0 | OWN | 40000.0 | VERIFIED - income source | small_business | CA | 5.55 | 0.0 | 32.6 | 13.0 | BAD | 7.0
7 | 5375.0 | 60.0 | 12.69 | 0.5 | RENT | 15000.0 | VERIFIED - income | other | TX | 18.08 | 0.0 | 36.5 | 3.0 | BAD | 7.0
8 | 6500.0 | 60.0 | 14.65 | 5.0 | OWN | 72000.0 | not verified | debt_consolidation | AZ | 16.12 | 0.0 | 20.6 | 23.0 | GOOD | 13.0
9 | 12000.0 | 36.0 | 12.69 | 10.0 | OWN | 75000.0 | VERIFIED - income source | debt_consolidation | CA | 10.78 | 0.0 | 67.1 | 34.0 | GOOD | 22.0

Last rows

Row | Loan_Amount | Term | Interest_Rate | Employment_Years | Home_Ownership | Annual_Income | Verification_Status | Loan_Purpose | State | Debt_to_Income | Delinquent_2yr | Revolving_Cr_Util | Total_Accounts | Bad_Loan | Longest_Credit_Length
163984 | 11975.0 | 36.0 | 22.99 | 2.0 | RENT | 40000.0 | VERIFIED - income | small_business | TX | 25.32 | 0.0 | 8.6 | 14.0 | GOOD | 6.0
163985 | 4000.0 | 36.0 | 6.99 | 10.0 | MORTGAGE | 58000.0 | VERIFIED - income source | debt_consolidation | WI | 1.03 | 0.0 | 0.5 | 18.0 | GOOD | 19.0
163986 | 2000.0 | 36.0 | 8.19 | 8.0 | MORTGAGE | 31000.0 | VERIFIED - income source | credit_card | CA | 8.45 | 0.0 | 16.9 | 48.0 | GOOD | 11.0
163987 | 7000.0 | 36.0 | 13.66 | 10.0 | RENT | 48681.0 | not verified | debt_consolidation | NY | 10.85 | 1.0 | 58.1 | 41.0 | GOOD | 20.0
163988 | 26500.0 | 36.0 | 23.99 | 8.0 | MORTGAGE | 170000.0 | VERIFIED - income source | small_business | NJ | 5.89 | 0.0 | 23.6 | 9.0 | GOOD | 6.0
163989 | 15000.0 | 60.0 | 12.39 | 3.0 | MORTGAGE | 45000.0 | not verified | credit_card | OK | 31.44 | 4.0 | 75.8 | 34.0 | GOOD | 20.0
163990 | 20000.0 | 36.0 | 14.99 | 10.0 | OWN | 80000.0 | VERIFIED - income | home_improvement | VA | 23.65 | 0.0 | 68.8 | 18.0 | GOOD | 22.0
163991 | 12825.0 | 36.0 | 17.14 | 6.0 | MORTGAGE | 38000.0 | not verified | debt_consolidation | TX | 9.03 | 0.0 | 70.7 | 24.0 | GOOD | 9.0
163992 | 27650.0 | 60.0 | 21.99 | 0.5 | RENT | 60000.0 | VERIFIED - income source | credit_card | NY | 10.10 | 1.0 | 61.2 | 20.0 | GOOD | 6.0
163993 | 17000.0 | 60.0 | 15.99 | 10.0 | MORTGAGE | 63078.0 | VERIFIED - income source | debt_consolidation | PA | 31.70 | 0.0 | 54.0 | 28.0 | GOOD | 16.0